-
Notifications
You must be signed in to change notification settings - Fork 450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-5953][VL] Prevent pushdown filters with unsupported data types to scan node #5954
Conversation
Do you mean these fields are not supported in filter pushdown?
|
@FelixYBW Before this PR, Gluten will try to pushdown them, but throw BTW, |
e870cd1
to
65614a1
Compare
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
"""select * from t where binary_field = '3.14'""".stripMargin | ||
)(checkGlutenOperatorMatch[FileSourceScanExecTransformer]) | ||
runQueryAndCompare( | ||
"""select * from t where timestamp_field = current_timestamp()""".stripMargin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have made some updates on the timestamp support. Could you help check if the pushdown of timestamp is supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it's already supported in velox side. But as for now, Gluten can still fallback since mapToFilters
doesn't support yet. I think we can support it in next PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Would you like to open an issue to track its support? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is the issue #6642
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
65614a1
to
5ba1899
Compare
Run Gluten Clickhouse CI |
I think we should also prevent pushdown filters from singularOrLists. |
@jiangjiangtian maybe yes, but since there is already type check, I'm not quite sure. https://github.com/apache/incubator-gluten/blob/main/cpp/velox/substrait/SubstraitToVeloxPlan.cc#L2469 . I think it can be in another PR if needed? |
case TypeKind::BOOLEAN: | ||
case TypeKind::VARCHAR: | ||
case TypeKind::ARRAY: | ||
case TypeKind::MAP: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like most type are supported. Would you list the unsupported types instead?
static bool isPushdownSupported(TypePtr inputType); | ||
|
||
/// Check whether the scalar function contains data type that doesn't to pushdown. | ||
static bool canPushdownScalarFunction( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we merge it with canPushdownFunction
which checks if a scalar function can be pushed down?
Yes, I think you are right. There is already type check. |
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
8c36b3f
to
4afd05e
Compare
Run Gluten Clickhouse CI |
4afd05e
to
a6f57c7
Compare
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
@@ -1712,19 +1733,6 @@ void SubstraitToVeloxPlanConverter::separateFilters( | |||
for (const auto& scalarFunction : scalarFunctions) { | |||
auto filterNameSpec = SubstraitParser::findFunctionSpec(functionMap_, scalarFunction.function_reference()); | |||
auto filterName = SubstraitParser::getNameBeforeDelimiter(filterNameSpec); | |||
// Add all decimal filters to remaining functions because their pushdown are not supported. | |||
if (format == dwio::common::FileFormat::ORC && scalarFunction.arguments().size() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like previous logic only excludes decimal type for ORC format, but now we are excluding HUGEINT for all file formats. Could you confirm if it is expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ORC format already supports short-decimal predicate pushdown. @jiangjiangtian Can you help test this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main idea of these three types (Timestamp/VarBinary/HugeInt
) in this PR is that if we don't put them in remainingFilter
, then when we call mapToFilters
, a exception Subfield filters creation not supported for input type '{}' in mapToFilters
will throw, which will cause scan fallback.
The reason why mapToFilters
don't support these three types is because there is no MultiRangeType
or related type traits defined, which has nothing to do with the fileformat. cc @rui-mo @jiangjiangtian @kecookier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means actually we don't support the pushdown of long decimal in Parquet and ORC. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means actually we don't support the pushdown of long decimal in Parquet and ORC. Is that correct?
@rui-mo Currently yes. But to support a HugeintMultiRange
in velox side is not a big issue, I can work on it later.
Run Gluten Clickhouse CI |
d796a8c
to
df3eb3f
Compare
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
return canPushdown; | ||
|
||
// Check whether data type is supported or not | ||
if (!veloxTypeList.empty() && fieldIdx < veloxTypeList.size() && !isPushdownSupported(veloxTypeList.at(fieldIdx))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a bug if 'fieldIdx >= veloxTypeList.size()'? Shall we just add a check to ensure it does not happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks. |
What changes were proposed in this pull request?
Prevent pushdown filters with unsupported data types to scan node in case the scan fallback to vanilla scan operator
(Fixes: #5953)
How was this patch tested?
UT